ANN with MNIST


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. What is MNIST?

From Wikipedia

  • The MNIST database (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, NIST's complete dataset was too hard.
  • MNIST (Mixed National Institute of Standards and Technology) database
    • Handwritten digit database
    • $28 \times 28$ gray-scale images
    • Each image can be flattened into a vector of length $28 \times 28 = 784$
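The flattening mentioned above can be sketched in NumPy; here a random array stands in for a real MNIST image:

```python
import numpy as np

# A stand-in for one grayscale MNIST image: a 28 x 28 array of intensities
img = np.random.rand(28, 28)

# Flatten the matrix into a 784-dimensional vector
vec = img.reshape(-1)

print(img.shape)   # (28, 28)
print(vec.shape)   # (784,)
```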




We will use MNIST to build a multinomial classifier that assigns each image to one of the ten classes 0 through 9. Succinctly, we are teaching a computer to recognize handwritten digits.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

Let's download and load the dataset.

In [2]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
In [3]:
print ("The training data set is:\n")
print (train_x.shape)
print (train_y.shape)
The training data set is:

(60000, 28, 28)
(60000,)
In [4]:
print ("The test data set is:")
print (test_x.shape)
print (test_y.shape)
The test data set is:
(10000, 28, 28)
(10000,)

Let's look at a sample from it:

In [5]:
# well, that's not a picture (or image), it's an array.

train_x[5].shape
Out[5]:
(28, 28)

The training set really is made up of $28 \times 28$ grayscale images of handwritten digits: `tf.keras.datasets.mnist` returns each sample as a 2D array rather than as a flattened 784-dimensional vector.

Had the images arrived flattened, `reshape` would restore the $28 \times 28$ shape. The two calls below are equivalent ways of writing that (and are no-ops on an array that already has this shape).

In [6]:
img = np.reshape(train_x[5], (28,28))
In [7]:
img = train_x[5].reshape(28,28)
In [8]:
# So now we have a 28x28 matrix, where each element is an intensity level from 0 to 1.
img.shape
Out[8]:
(28, 28)

Let's visualize what some of these images and their corresponding training labels look like.

In [9]:
plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
In [10]:
train_y[5]
Out[10]:
2

2. ANN with TensorFlow

  • Feed a grayscale image to the ANN


  • Our network model



  • Network training (learning) $$\omega := \omega - \alpha \nabla_{\omega}\, \ell \left( h_{\omega} \left(x^{(i)}\right), y^{(i)}\right)$$ where $\ell$ is the loss function and $\alpha$ is the learning rate
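As a toy illustration of this update rule, a few gradient-descent steps on a one-parameter least-squares problem can be written directly in NumPy; the data, initial weight, and learning rate here are made up for the example:

```python
import numpy as np

# Toy data generated by y = 2x, so the optimal weight is w = 2
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = 0.0       # initial weight
alpha = 0.1   # learning rate

for _ in range(100):
    grad = np.mean(2 * (w * x - y) * x)   # gradient of the mean squared error
    w = w - alpha * grad                  # the update rule above

print(round(w, 4))   # converges to 2.0
```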

2.1. Import Library

In [11]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

2.2. Load MNIST Data

  • Download MNIST data from tensorflow tutorial example
In [12]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
In [13]:
img = train_x[5].reshape(28,28)

plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()

2.3. Define an ANN Structure

  • Input size
  • Hidden layer size
  • The number of classes
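Written out as variables, the structure could look like the following sketch; the names are our own, and the values match the model built in Section 2.5:

```python
n_input = 28 * 28   # size of one flattened MNIST image
n_hidden = 100      # hidden layer size
n_output = 10       # number of classes (digits 0-9)

print(n_input, n_hidden, n_output)   # 784 100 10
```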


2.4. Define Weights, Biases, and Placeholder

  • Define parameters based on predefined layer size
  • Initialize with normal distribution with $\mu = 0$ and $\sigma = 0.1$
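A minimal NumPy sketch of such an initialization (variable names are ours; $\mu = 0$, $\sigma = 0.1$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a weight matrix and bias vector from N(mu = 0, sigma = 0.1)
W = rng.normal(loc = 0.0, scale = 0.1, size = (784, 100))
b = rng.normal(loc = 0.0, scale = 0.1, size = 100)

print(W.shape)                    # (784, 100)
print(round(float(W.std()), 2))   # sample std is close to 0.1
```

In Keras the same choice can be expressed by passing `tf.keras.initializers.RandomNormal(mean = 0.0, stddev = 0.1)` as the `kernel_initializer` of a `Dense` layer; the model below simply uses the Keras defaults.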

2.5. Build a Model

First, the layer performs a matrix multiplication to produce a set of linear activations



$$y_j = \left(\sum\limits_i \omega_{ij}x_i\right) + b_j$$

$$y = \omega^T x + b$$


Second, each linear activation is run through a nonlinear activation function (ReLU in the model below)




Third, predict values with an affine transformation
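The three steps can be sketched end-to-end in NumPy for a single hidden layer; the random weights below stand in for trained parameters, and softmax is written out by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(784)   # one flattened input image (random stand-in)

# Random stand-ins for the trained parameters of the two layers
W1, b1 = 0.1 * rng.normal(size = (784, 100)), np.zeros(100)
W2, b2 = 0.1 * rng.normal(size = (100, 10)), np.zeros(10)

# Step 1: linear activations  y_j = (sum_i w_ij x_i) + b_j
z1 = W1.T @ x + b1

# Step 2: each linear activation through a nonlinearity (ReLU)
h1 = np.maximum(z1, 0)

# Step 3: an affine transformation to class scores, then softmax
z2 = W2.T @ h1 + b2
p = np.exp(z2 - z2.max()) / np.exp(z2 - z2.max()).sum()

print(p.shape)             # (10,)
print(round(p.sum(), 6))   # 1.0 -- a probability distribution over the digits
```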



In [14]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28)),
    tf.keras.layers.Dense(units = 100, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

2.6. Define Loss and Optimizer

Loss

  • This defines how we measure how accurate the model is during training. As was covered in lecture, during training we want to minimize this function, which will "steer" the model in the right direction.
  • Classification: Cross entropy
    • Equivalent to applying logistic regression
$$ -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_{\theta}\left(x^{(i)}\right)\right) + \left(1-y^{(i)}\right)\log\left(1-h_{\theta}\left(x^{(i)}\right)\right)\right] $$
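For a binary case, this loss can be computed directly; the predictions `h` and labels `y` below are made up for the example:

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 1.0])    # true labels y^(i)
h = np.array([0.9, 0.2, 0.8, 0.6])    # model outputs h_theta(x^(i))

loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
print(round(loss, 4))   # 0.2656
```

The model in this notebook uses `sparse_categorical_crossentropy`, the multiclass generalization of this expression for integer labels.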

Optimizer

  • This defines how the model is updated based on the data it sees and its loss function.
  • Adam: a widely used optimizer that adapts the learning rate during training

2.7. Define Optimization Configuration and Then Optimize




  • Define parameters for training ANN
    • n_batch: batch size for mini-batch gradient descent
    • n_iter: the number of iteration steps per epoch
    • n_epoch: the number of passes over the entire x and y data provided
  • Metrics
    • Here we can define metrics used to monitor the training and testing steps. In this example, we'll look at the accuracy, the fraction of the images that are correctly classified.
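These parameters map onto `model.fit` as in the sketch below; the batch size of 50 is an arbitrary example value, while `epochs = 5` matches the training run later in this notebook:

```python
n_batch = 50   # batch size for mini-batch gradient descent (example value)
n_epoch = 5    # passes over the entire training set

# n_iter, the number of gradient steps per epoch, follows from the data size:
n_iter = -(-60000 // n_batch)   # ceiling division: steps per epoch

# The call itself would then be:
# model.fit(train_x, train_y, batch_size = n_batch, epochs = n_epoch)
print(n_iter)   # 1200
```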

Initializer

  • Initialize all the variables
In [15]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [16]:
# Train Model

loss = model.fit(train_x, train_y, epochs = 5)
Epoch 1/5
60000/60000 [==============================] - 2s 31us/sample - loss: 0.2745 - acc: 0.9220
Epoch 2/5
60000/60000 [==============================] - 2s 29us/sample - loss: 0.1255 - acc: 0.9636
Epoch 3/5
60000/60000 [==============================] - 2s 28us/sample - loss: 0.0867 - acc: 0.9737
Epoch 4/5
60000/60000 [==============================] - 2s 28us/sample - loss: 0.0653 - acc: 0.9801
Epoch 5/5
60000/60000 [==============================] - 2s 29us/sample - loss: 0.0527 - acc: 0.9836
In [17]:
# Evaluate Test Data

test_loss, test_acc = model.evaluate(test_x, test_y)
10000/10000 [==============================] - 0s 18us/sample - loss: 0.0769 - acc: 0.9749

2.8. Test or Evaluate

In [18]:
test_img = test_x[np.random.choice(test_x.shape[0], 1)]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
Prediction : 8

You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of overfitting, when a machine learning model performs worse on new data than on its training data.

What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...

$\Rightarrow$ As we saw in lecture, convolutional neural networks (CNNs) are particularly well-suited for a variety of tasks in computer vision, and have achieved near-perfect accuracies on the MNIST dataset. We will build a CNN and ultimately output a probability distribution over the 10 digit classes (0-9) in the next lectures.

In [19]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')